Inadequacies of minimum spanning trees in molecular epidemiology.

Authors

  • Stephen J Salipante
  • Barry G Hall
Abstract

Minimum spanning trees (MSTs) are frequently used in molecular epidemiology research to estimate relationships among individual strains or isolates. Nevertheless, there are significant caveats to MST algorithms that have been largely ignored in molecular epidemiology studies and that have the potential to confound or alter the interpretation of the results of those analyses. Specifically, (i) presenting a single, arbitrarily selected MST illustrates only one of potentially many equally optimal solutions, and (ii) statistical metrics are not used to assess the credibility of MST estimations. Here, we survey published MSTs previously used to infer microbial population structure in order to determine the effect of these factors. We propose a technique to estimate the number of alternative MSTs for a data set and find that multiple MSTs exist for each case in our survey. By implementing a bootstrapping metric to evaluate the reliability of alternative MST solutions, we discover that they encompass a wide range of credibility values. On the basis of these observations, we conclude that current approaches to studying population structure using MSTs are inadequate. We instead propose a systematic approach to MST estimation that bases analyses on the optimal computation of an input distance matrix, provides information about the number and configurations of alternative MSTs, and allows identification of the most credible MST or MSTs by using a bootstrapping metric. It is our hope this algorithm will become the new "gold standard" approach for analyzing MSTs for molecular epidemiology so that this generally useful computational approach can be used informatively and to its full potential.
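The two caveats raised in the abstract can be illustrated on a toy data set. The sketch below is not the authors' algorithm. It assumes hypothetical allelic profiles for five isolates, uses Hamming distance between profiles, enumerates every MST by brute force (feasible only for very small inputs), and then resamples loci with replacement to give each edge a simple bootstrap support value. The names `PROFILES`, `all_msts`, and `bootstrap_edge_support`, and the scoring of a tree by its mean edge support, are illustrative assumptions.

```python
from itertools import combinations
import random

# Hypothetical allelic profiles (one allele number per locus) for five isolates.
PROFILES = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1, 2),
    "C": (1, 1, 1, 1, 1, 2, 2),
    "D": (1, 1, 1, 1, 1, 2, 1),
    "E": (3, 1, 1, 1, 1, 2, 1),
}


def hamming(p, q, loci):
    """Number of the given loci at which two profiles differ."""
    return sum(p[i] != q[i] for i in loci)


def is_spanning_tree(nodes, edges):
    """True if the n-1 given edges contain no cycle (union-find check)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return True


def all_msts(profiles, loci):
    """Brute-force enumeration: return (minimum weight, list of all MSTs)."""
    nodes = sorted(profiles)
    weight = {
        (u, v): hamming(profiles[u], profiles[v], loci)
        for u, v in combinations(nodes, 2)
    }
    best, trees = None, []
    for edges in combinations(weight, len(nodes) - 1):
        if not is_spanning_tree(nodes, edges):
            continue
        total = sum(weight[e] for e in edges)
        if best is None or total < best:
            best, trees = total, [frozenset(edges)]
        elif total == best:
            trees.append(frozenset(edges))
    return best, trees


def bootstrap_edge_support(profiles, replicates=200, seed=0):
    """Fraction of locus-resampled replicates in which each edge is in some MST."""
    rng = random.Random(seed)
    n_loci = len(next(iter(profiles.values())))
    counts = {}
    for _ in range(replicates):
        loci = [rng.randrange(n_loci) for _ in range(n_loci)]
        _, trees = all_msts(profiles, loci)
        for edge in set().union(*trees):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / replicates for edge, c in counts.items()}


total, trees = all_msts(PROFILES, range(7))
print("minimum weight:", total, "number of MSTs:", len(trees))

support = bootstrap_edge_support(PROFILES)
for tree in trees:
    # Assumed scoring rule: mean bootstrap support of the tree's edges.
    score = sum(support.get(e, 0.0) for e in tree) / len(tree)
    print(sorted(tree), round(score, 2))
```

In this toy data set, isolates A, B, C, and D form a cycle of four distance-1 edges, so dropping any one of those edges gives an equally optimal tree: four MSTs of total weight 4. Reporting only one of them would hide the other three, which is the first caveat the abstract describes. The bootstrap scores depend on the random seed and on the assumed scoring rule.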


Similar resources

Optimal Self-healing of Smart Distribution Grids Based on Spanning Trees to Improve System Reliability

In this paper, a self-healing approach for smart distribution networks is presented, based on graph theory and cut sets. In the proposed graph-theory-based approach, the upstream grid and all the existing microgrids are modeled as a common node after a fault occurs. Thereafter, the maneuvering lines that are in the cut sets are selected as the recovery path for alternative networks by making ...


Counting the number of spanning trees of graphs

A spanning tree of a graph G is a spanning subgraph of G that is a tree. In this paper, we focus our attention on (n,m) graphs, where m = n, n + 1, n + 2, n + 3 and n + 4. We also determine some coefficients of the Laplacian characteristic polynomial of fullerene graphs.


On relation between the Kirchhoff index and number of spanning trees of graph

Let $G=(V,E)$, $V=\{1,2,\ldots,n\}$, $E=\{e_1,e_2,\ldots,e_m\}$, be a simple connected graph, with sequence of vertex degrees $\Delta=d_1\geq d_2\geq\cdots\geq d_n=\delta>0$ and Laplacian eigenvalues $\mu_1\geq\mu_2\geq\cdots\geq\mu_{n-1}>\mu_n=0$. Denote by $Kf(G)=n\sum_{i=1}^{n-1}\frac{1}{\mu_i}$ and $t=t(G)=\frac{1}{n}\prod_{i=1}^{n-1}\mu_i$ the Kirchhoff index and number of spanning tree...


NUMBER OF SPANNING TREES FOR DIFFERENT PRODUCT GRAPHS

In this paper, simple formulae are derived for calculating the number of spanning trees of different product graphs. The products considered here consist of the Cartesian, strong Cartesian, direct, lexicographic and double graph. For this purpose, the Laplacian matrices of these product graphs are used. For some of these products simple formulae are derived and whenever direct formulation was n...


Providing a Simple Method for the Calculation of the Source and Target Reliability in a Communication Network (SAT)

The source and target reliability in a SAT network is defined as the flawless transmission from the source node to all the other nodes. In some references, the SAT process has been followed between all the node pairs, but this is very time-consuming in today's widespread networks and involves many costs. In this article, a method has been proposed to compare the reliability in complex networks b...



Journal title:
  • Journal of clinical microbiology

Volume 49, Issue 10

Pages  -

Publication date 2011